As the world population continues to grow, housing crises are intensifying in cities around the globe. Hong Kong, China is often cited as having one of the most severe housing shortages of any city, yet it is also one of the world's most visited tourist destinations. This makes it an interesting setting for an exploration of short-term rental prices.
Airbnb is a short-term rental company that specializes in vacation rentals. Detailed open data on its listings is available for most major cities across the world, including Hong Kong.
For this exploration, the following will be covered:
The purpose is to identify potential trends in the vacation rental market in cities such as Hong Kong, where residents themselves struggle to find housing.
We will begin by loading the data, which was downloaded from Inside Airbnb, an initiative intended to make Airbnb data open and transparent to the public, customers and hosts alike. The data was last updated on 17 September 2023. As stated on the Inside Airbnb website, “The data behind the Inside Airbnb site is sourced from publicly available information from the Airbnb site. The data has been analyzed, cleansed and aggregated to facilitate public discussion.” That said, the data still requires further cleaning, which can be followed below.
Although the data has been pre-cleaned, it requires further cleaning for the purposes of this study. To begin with, there are many variables, some of which (such as the date of the last data scrape) are unnecessary for the task at hand; these have been dropped from the dataset. In addition, once the graphs had been plotted, the categorical variables were encoded as dummy variables.
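The cleaning steps described above can be sketched with pandas as follows. This is a minimal illustration on a tiny made-up frame, not the exact code used for this post. The column names are assumed to match the Inside Airbnb listings file, and the assumption that `price` arrives as a string such as `"$1,200.00"` should be checked against the actual download.

```python
import pandas as pd

# Tiny made-up sample standing in for the Inside Airbnb listings file.
listings = pd.DataFrame({
    "last_scraped": ["2023-09-17", "2023-09-17", "2023-09-17"],
    "price": ["$450.00", "$1,200.00", "$98.00"],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
})

# Drop a column that is irrelevant to the analysis (illustrative choice).
cleaned = listings.drop(columns=["last_scraped"])

# Convert price strings to numbers by stripping "$" and thousands separators.
cleaned["price"] = (
    cleaned["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Encode the categorical variable as dummy (0/1) columns for later regressions.
encoded = pd.get_dummies(cleaned, columns=["room_type"], dtype=int)
print(encoded)
```

Keeping the un-encoded `cleaned` frame around is convenient for plotting, since bar charts are easier to label from the original category names than from dummy columns.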
Let us first explore the price distribution, which gives a good first picture of the vacation rental market in Hong Kong. For reference, according to InterNations, the average monthly rent for permanent residents of Hong Kong in 2023 is between 1,500 and 2,500 USD. Taking the midpoint of 2,000 USD over a 30-day month, this comes to approximately 67 USD per night.
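The nightly figure can be reproduced with a few lines of arithmetic. The use of the range midpoint and a 30-day month are my assumptions about how the conversion was done.

```python
# Monthly rent range quoted in the text (USD).
low_rent, high_rent = 1_500, 2_500

# Assumption: take the midpoint of the range and a 30-day month.
midpoint_rent = (low_rent + high_rent) / 2
nightly_equivalent = midpoint_rent / 30

print(f"Nightly equivalent: {nightly_equivalent:.2f} USD")
```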
We can see that the average listing price per night is approximately $850. This is despite the fact that the distribution is skewed to the right: most prices are below $500, and many listings cluster at approximately $100 per night. We can therefore assume that the mean is pulled upward by a small number of very expensive outliers. Still, the average nightly price far exceeds the average nightly rent of a permanent resident of Hong Kong. This suggests that high demand from tourists from wealthier countries may play a role in the housing crisis.
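One quick way to check how strongly outliers affect an average is to compare the mean with the median. The sketch below uses made-up prices (not values from the dataset) in which most listings are cheap and two are extremely expensive.

```python
import statistics

# Made-up nightly prices: most are low, a few are extremely high.
prices = [100, 100, 120, 150, 200, 250, 300, 450, 5_000, 12_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

# A mean far above the median signals a long tail of expensive listings.
print(f"mean = {mean_price:.0f}, median = {median_price:.0f}")
```

When the two diverge this much, the median is usually the more representative summary of a typical listing.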
We will further explore this topic with a more in-depth view of the listings and hosts, as well as with a connection to census data collected by the city of Hong Kong in the year 2023.
First of all, let us consider the properties listed. There are two relevant variables in the dataset: one describing Property Types and one describing Room Types. The two share levels, and in fact Room Types is a coarser grouping of Property Types. In the interest of simplicity and ease of interpretation, we will use the Room Types variable in our analysis.
To begin with, we will determine the frequency of listings belonging to each Room Type. Each listing is categorized as (1) private room, (2) entire home/apt, (3) shared room, or (4) hotel room. For reference, the Property Type variable splits these levels even further, including “private room in B&B,” “private room in hostel,” etc. The frequency of each of the four levels is shown in a simple vertical bar chart.
Our findings suggest that the majority of listings can be categorized as private rooms, with the next most frequent category being entire homes or apartments. Given additional data cleanup with the use of the Property Types variable, an interesting next step would be to split the levels further and evaluate whether private rooms are more often found in permanent residents’ personal homes, in hostels, in bed and breakfasts, etc. In particular, if a large portion of private rooms are found in personal homes, this may be a key insight into the relationship between the vacation rental market and the housing crisis in Hong Kong.
In addition to the Property and Room Types, a list of amenities can also be found for each listing in the dataset. Although this next analysis does not provide much insight into the topic of housing crises, let us consider a word cloud of the most frequently listed amenities.
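A minimal sketch of the frequency count behind such a bar chart is shown below. The sample values are made up, and the column name `room_type` is assumed to match the dataset.

```python
import pandas as pd

# Made-up sample of the room_type column.
room_types = pd.Series(
    ["Private room", "Entire home/apt", "Private room",
     "Shared room", "Private room", "Hotel room", "Entire home/apt"],
    name="room_type",
)

# value_counts() returns frequencies sorted from most to least common.
counts = room_types.value_counts()
print(counts)

# With matplotlib installed, counts.plot(kind="bar") would draw the chart.
```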
Our findings show that the most common amenities are [air] conditioning and wifi. Another prevalent word, “allowed,” may refer to children or pets. The dataset also includes columns for the listings’ descriptions and names. These columns were omitted from the analysis because they were not cleaned. The data was likely collected by scraping the HTML of Airbnb’s website, as remnants of HTML code remain in these columns.
In a future study where such in-depth data cleanup is warranted, the description would be an interesting column to explore via text analysis. This way, we may see what the most popular “attractions” are; that is, we would gain insights into what tourists value in a listing, whether that be proximity to restaurants, public transportation, etc.
Now let us explore the distribution of listings across latitude and longitude coordinates, grouped by neighbourhood. We will do this using a scatterplot in a manner similar to a heatmap, except that points are color-coded by neighbourhood and positioned by latitude and longitude.
Based on the graph above, we find that most listings are clustered in the following neighbourhoods: Islands, Eastern, Central and Western, Wan Chai, Wong Tai Sin, and Kwai Tsing, among a few others. That being said, this heatmap-style graph is only meant to be used as a preliminary exploration of the neighbourhood clusters.
A more informative map can be constructed to better determine the distribution of listings across the neighbourhoods of Hong Kong. To do this, I have used OpenStreetMap to construct an interactive background map on which each listing is color-coded by its price per night relative to the other listings in the dataset.
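The appearance of “allowed” as a standalone word is a side effect of counting individual words rather than whole amenities. The sketch below illustrates the difference. It assumes the `amenities` column stores each listing's amenities as a JSON-style list in a string, and the sample rows (including "Pets allowed" and "Children allowed") are invented for illustration.

```python
import json
import re
from collections import Counter

# Made-up rows, assuming each cell is a JSON array encoded as a string.
amenities_column = [
    '["Wifi", "Air conditioning", "Pets allowed"]',
    '["Wifi", "Air conditioning", "Children allowed"]',
    '["Air conditioning", "Kitchen"]',
]

amenity_counts = Counter()  # counts whole amenities
word_counts = Counter()     # counts single words, as a word cloud does

for raw in amenities_column:
    for amenity in json.loads(raw):
        amenity_counts[amenity] += 1
        word_counts.update(re.findall(r"[a-z]+", amenity.lower()))

print(amenity_counts.most_common(2))
print(word_counts["allowed"])
```

Counting whole amenities keeps "Pets allowed" and "Children allowed" distinct, which makes the result easier to interpret than a word-level cloud.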
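As a starting point for that cleanup, leftover markup can be removed with the standard library. This is only a rough sketch on an invented description string; a regular expression is a blunt tool for HTML, and a proper parser would be safer for messier input.

```python
import html
import re


def strip_html(text: str) -> str:
    """Remove tags, decode HTML entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    decoded = html.unescape(no_tags)
    return re.sub(r"\s+", " ", decoded).strip()


# Invented example of a scraped description with HTML remnants.
sample = "Cozy studio<br />near the MTR &amp; restaurants.<b>Great view</b>"
print(strip_html(sample))
```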
In combination with the heatmap, we can now identify the precise clustering of listings in the given neighbourhoods of Hong Kong as well as their price per night (in USD).
Let us explore the neighbourhoods at a greater depth by looking at average price in the most and least expensive neighbourhoods.
Beginning with the most expensive neighbourhoods:
We see that Islands, Kwai Tsing, Wong Tai Sin, and Southern, all with high clusters of listings, are each among the most expensive neighbourhoods.
Let’s explore the neighbourhoods by average price once more, now looking at the least expensive neighbourhoods.
Here we see some of the other neighbourhoods with high clusters of listings: Wan Chai, Eastern, and Kwai Tsing.
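Rankings like the two above can be produced with a single group-by. The sketch below uses placeholder district names and made-up prices; `neighbourhood_cleansed` and `price` are assumed to be the relevant column names.

```python
import pandas as pd

# Made-up listings in three placeholder districts.
listings = pd.DataFrame({
    "neighbourhood_cleansed": ["District A", "District A", "District B",
                               "District B", "District C", "District C"],
    "price": [300.0, 500.0, 1_000.0, 3_000.0, 150.0, 250.0],
})

# Mean nightly price per neighbourhood.
mean_price = listings.groupby("neighbourhood_cleansed")["price"].mean()

most_expensive = mean_price.nlargest(2)
least_expensive = mean_price.nsmallest(2)

print(most_expensive)
print(least_expensive)
```

Note that with few neighbourhoods the two rankings can overlap, so a district may appear in both lists.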
Let us compare this to the population density of the districts of Hong Kong, as produced in a graph by the Census and Statistics Department (C&SD) of Hong Kong.
Here we see that the neighbourhood of Sham Shui Po has both a high population density and a high number of Airbnb listings. We now also know that Sham Shui Po is, in fact, the least expensive neighbourhood in terms of mean Airbnb listing price. The same holds for Yau Tsim Mong, which also has a high population density and a high number of Airbnb listings, paired with one of the lowest mean listing prices.
At this point, a next step in the research would be to run an analysis by grouping the property types by neighbourhood and observing the frequency distribution.
However, before conducting any further analysis, let us consider the housing of the Hong Kong population. To explore this, I have included a second graph from the 2021 Hong Kong Population Census conducted by the Census and Statistics Department (C&SD).
The usefulness of this particular graph is open to some criticism. There seems to be little change from the start of the decade to the end, so the purpose of the comparison is not very clear. Regardless, we can focus on the 2021 side to see that the majority (53%) of the Hong Kong population lives in private permanent housing. This data can be found in greater detail on Wikipedia, which covers each category of housing. There we find that “private permanent housing” denotes housing of two categories:
The next most common category of housing (with 29% of the population) is public rental housing, which is owned by the Hong Kong Housing Authority and the Hong Kong Housing Society.
The third most common category (with 16% of the population) is subsidized home ownership housing, which denotes units purchased by tenants with “alienation restrictions.”
We can assume that the majority of private rooms listed on Airbnb are part of the 53% in private permanent housing. This is supported by the dataset column denoting Property Types. We can observe the different types in the dataset as follows:
By sorting “Count” from largest to smallest, we see that the most common property listing type is “Private room in rental unit.” As such, there is evidence to support that housing in Hong Kong may very well be impacted by the Airbnb vacation and short-term rental market. However, in order to draw any concrete conclusions, more research would be required to determine the proportion of households in private permanent housing who rent out one or more rooms and perhaps the income they receive from that in proportion to their salary and the cost of living (i.e. rent).
Due to the scope of the project at hand, we will consider this a satisfactory conclusion, leaving room for further analysis in the future. I may, for example, continue the project over the course of the following month for the use of interview material.
Until then, we can conclude that there is certainly a potential relationship between the housing situation of Hong Kong and the Airbnb listing distribution.
We have determined that many of the cheapest neighbourhoods have both some of the highest numbers of Airbnb listings and some of the highest population densities. Let us now consider the neighbourhoods with more expensive listings. Recall that the most expensive neighbourhoods in terms of mean listing price were Tsuen Wan, Tuen Mun, Sai Kung, and Southern. If we consider the Population Density graph from the C&SD of Hong Kong, we see that these neighbourhoods are among the least densely populated districts, each with a population density of 10,000 persons or fewer per square kilometre. Most, in fact, have a density of 5,000 persons or fewer per square kilometre.
The locations that hosts come from could also be a confounding variable in terms of listing price. Perhaps, for example, the hosts in these expensive neighbourhoods are more likely to be foreigners than families renting out a room in their house. As such, let us explore the distribution of host locations. We will do this using a bar chart detailing the relationship between average listing price and a host’s country of origin.
Indeed, our guess was correct; the most expensive listings are owned by people foreign to Hong Kong, with the addition of one particularly wealthy host from Jassans-Riottier, France. Let us look at the data again with the top 10 hosts, omitting the host from Jassans-Riottier, as we can consider it an outlier.
Now we can see a more reasonable distribution of average prices, in the thousands rather than that one value of $50,000. I will note that there are some very expensive listings by hosts from Hong Kong as well; in fact, there is one valued at $150,000! However, the large number of very cheap listings pulls the Hong Kong average down, so hosts from Hong Kong do not appear in the graphs above.
Now let us begin examining the relationships between variables. We will start with a basic regression exploring the relationship between the listing price and the total number of listings a host has. This is based on the reasonable assumption that the more money a host has, the more vacation rental properties they are likely to own.
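An alternative to removing a single outlying location by hand is to report the median alongside the mean and to require a minimum number of listings per location. This is a different technique from the manual omission used above, shown here only as a sketch on made-up data; `host_location` is assumed to be the relevant column, and the threshold of two listings is arbitrary.

```python
import pandas as pd

# Made-up listings: "City Y" is represented by one very expensive listing.
listings = pd.DataFrame({
    "host_location": ["Hong Kong"] * 4 + ["City X", "City X", "City Y"],
    "price": [100.0, 200.0, 300.0, 400.0, 2_000.0, 4_000.0, 50_000.0],
})

summary = listings.groupby("host_location")["price"].agg(
    ["mean", "median", "count"]
)

# Keep only locations with at least two listings (arbitrary threshold).
robust = summary[summary["count"] >= 2].sort_values("median", ascending=False)
print(robust)
```

Filtering on the count removes one-off locations without needing to decide case by case which values are outliers.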
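For readers who want to reproduce this kind of one-variable regression, here is a minimal ordinary least squares sketch using NumPy. The data are invented to follow a roughly linear negative trend, so the numbers will not match the results reported in this post, and the variable names simply mirror the dataset columns.

```python
import numpy as np

# Invented data: price falls by about 2 per additional host listing.
host_listings_count = np.array([1.0, 2.0, 5.0, 10.0, 50.0])
noise = np.array([5.0, -5.0, 3.0, -3.0, 0.0])
price = 900.0 - 2.0 * host_listings_count + noise

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(host_listings_count), host_listings_count])

# Solve the least squares problem price ~ intercept + slope * count.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = coef

# R^2: share of the variance in price explained by the fitted line.
fitted = X @ coef
ss_res = np.sum((price - fitted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope = {slope:.3f}, R^2 = {r_squared:.3f}")
```

A statistics package would additionally report standard errors and significance stars like those in the table; this sketch only recovers the coefficients and the fit.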
| | price |
| --- | --- |
| host_listings_count | -2.0963*** |
| | (0.2166) |
| N | 6,735 |
| R² | 0.0137 |
| Notes: | ***Significant at the 1 percent level. |
| | **Significant at the 5 percent level. |
| | *Significant at the 10 percent level. |
_Note that the coefficient on host_listings_count is statistically significant at the 1 percent level, as denoted by the three asterisks._
In fact, we find that there is a negative relationship between listing price and the number of listings a host has! It seems that, in reality, the more vacation rental properties a host has, the cheaper each listing will be. This brings to light the potential issue that some very wealthy hosts are dominating the market. This creates an issue similar to the current one in Hawai’i, for example, where a number of wealthy hosts who are not native Hawai’ians dominate the market.
This would be another interesting factor to take into account when conducting further studies. It could even be a topic for a study in the vein of Behavioral and Experimental Economics, which would consider the policies existing in Hong Kong around the short-term rental market as well as the behavior of hosts and the general population of Hong Kong in response.
For the final topic of this blog post, I wanted to take a data science approach to explore a more complex scenario, in which we identify some of the predictor variables of listing price. This way, we can also explore confounding variables and reflect on what data may be missing. To do this in a manner similar to concepts covered in Econometrics, we will use a multiple linear regression as opposed to other Machine Learning algorithms.
All the variables used in the regression were selected using backward elimination, a method in which we begin with every candidate variable in the regression and, one by one, eliminate the least significant variable until all remaining variables are significant.
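The backward elimination procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for this post: it uses NumPy only, it treats a predictor as significant when its absolute t-statistic is at least 1.96 (a normal approximation to the 5 percent level, which is my assumption), and it runs on small constructed toy data in which the `noise` predictor is unrelated to price by design.

```python
import numpy as np


def ols_t_stats(design, y):
    """Return OLS coefficients and t-statistics (design includes intercept)."""
    n_obs, n_params = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n_obs - n_params)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, coef / np.sqrt(np.diag(cov))


def backward_eliminate(predictors, y, t_crit=1.96):
    """Drop the least significant predictor until all have |t| >= t_crit."""
    names = list(predictors)
    while names:
        design = np.column_stack(
            [np.ones(len(y))] + [predictors[name] for name in names]
        )
        _, t_stats = ols_t_stats(design, y)
        abs_t = np.abs(t_stats[1:])  # skip the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_crit:
            break  # every remaining predictor is significant
        names.pop(weakest)
    return names


# Constructed toy data: price depends on two predictors but not on "noise".
predictors = {
    "accommodates": np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]),
    "room_type_entire": np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]),
    "noise": np.array([1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]),
}
error = 10.0 * np.array([1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0])
price = (100.0 + 200.0 * predictors["accommodates"]
         + 500.0 * predictors["room_type_entire"] + error)

selected = backward_eliminate(predictors, price)
print(selected)
```

One caveat worth keeping in mind is that stepwise selection reuses the same data for choosing and testing variables, so the reported significance levels of the surviving predictors tend to be optimistic.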
| | price |
| --- | --- |
| room_type_entire | 552.9185*** |
| | (90.6002) |
| host_total_listings_count | -0.9505*** |
| | (0.2232) |
| accommodates | 197.2758*** |
| | (21.2648) |
| maximum_nights | -0.3597*** |
| | (0.0873) |
| availability_30 | 15.1325*** |
| | (3.5715) |
| neighbourhood_cleansed_Islands | -500.4484*** |
| | (178.7812) |
| neighbourhood_cleansed_Tuen_Mun | 2,400.0470*** |
| | (486.5980) |
| neighbourhood_cleansed_Wan_Chai | -235.8713** |
| | (100.0169) |
| neighbourhood_cleansed_Yau_Tsim_Mong | -224.4585** |
| | (91.7275) |
| N | 6,735 |
| R² | 0.0473 |
| Notes: | ***Significant at the 1 percent level. |
| | **Significant at the 5 percent level. |
| | *Significant at the 10 percent level. |
We find that the most significant predictors of price in Hong Kong are whether the vacation rental is an entire house or apartment, the total number of listings a host has, how many guests the listing can accommodate, the maximum number of nights offered, how many days the rental is available over the next 30 days, and whether the rental is found in one of the following neighbourhoods: Islands, Tuen Mun, Wan Chai, or Yau Tsim Mong.
If we recall, Islands and Tuen Mun are some of the most expensive neighbourhoods, and Yau Tsim Mong and Wan Chai are some of the cheapest. This is certainly an interesting point that could be further explored in additional analyses, as well as something that can be replicated in cities with similarly high levels of tourism and concurrent housing crises, such as Sydney, Australia and New York, USA.
We have conducted a preliminary yet in-depth analysis into the relationship between the Airbnb market and the housing situation in Hong Kong. This is an excellent first city for such analyses given Hong Kong’s high tourist levels and densely populated, high-demand housing market.
Although drawing concrete conclusions would require a far more in-depth study using multiple datasets, we have derived the following hypotheses:
The cheapest neighbourhoods for Airbnb rentals are also among the most densely populated; this may suggest that part of the housing crisis is due to short-term and vacation rentals dominating the housing market, particularly in cheap areas, which would theoretically be where most of the population resides.
Many of the listings are private rooms. In addition, the majority of Hong Kong residents live in private permanent housing. A next step for this would be to explore whether many residents rely on renting out parts of their dwellings to make rent.
Many of the hosts with the most expensive listings on average are foreigners. In this case, we can further explore the similarities between the housing crisis in Hong Kong and the one in Hawai’i, evaluating the level of impact that tourism and foreign house hunters have on the crisis.
Similarly to the above finding, we have identified that the more vacation rental properties a host has, the cheaper each listing will be. This suggests that the housing market may also be impacted by a general domination of short-term and vacation rentals over long-term housing. If any findings are drawn from this point in particular, a key next step would be to explore potential housing policies for specific neighbourhoods to favor long-term housing.
In general, the additional variable of profits would be extremely useful for further analysis. Yet, in the given data, we have found that neighbourhoods and the general affluence of the hosts are important variables for consideration. Vacation rental type, captured by the variable room_type, and size, captured by accommodates (how many guests a rental can accommodate), are two more obvious variables of importance, and this exploration supports that expectation.
With additional data and time, it is worth exploring datasets covering income levels and general census data about the long-term and permanent residents of Hong Kong as a more in-depth point of comparison. Regardless, it is my hope that this blog post piqued your interest in the vacation rental markets and their relationship to the housing crisis in Hong Kong. A more in-depth study could easily be replicated in cities across the world, with the potential for insights for policy-making.